Julia supports multiple floating-point types, including Float16 (half), Float32 (single) and Float64 (double) precision. Half-precision numbers are intended only as a storage format: when a Float16 value is involved in a computation, it is automatically promoted to Float32.
In [1]:
@printf("size of Float64 1.0 is %d.\n", sizeof(1.0))
@printf("size of Float32 1.0 is %d.\n", sizeof(float32(1.0)))
@printf("size of Float16 1.0 is %d.\n", sizeof(float16(1.0)))
NaN and Inf (as well as -Inf) are special floating-point values (IEEE 754 / IEC 60559); they can be converted to every floating-point type and can also be used in arithmetic operations.
In [2]:
@printf("Inf * Inf + Inf = %d\n", Inf * Inf + Inf)
@printf("Inf + (-Inf) = %d\n", Inf + (-Inf))
@printf("Inf * Inf + Inf = %lf\n", Inf * Inf + Inf)
@printf("Inf + (-Inf) = %lf\n", Inf + (-Inf))
The function eps gives the machine accuracy. For single and double precision, epsilon is float32(2.0^-23) and 2.0^-52 respectively. Notice that eps can also be used on Float16; the result is again a half-precision number.
In [3]:
@printf("%4.52lf\n", eps(Float16))
@printf("%4.52lf\n" ,eps(Float32))
@printf("%4.52lf\n", eps(Float64))
The default rounding mode is RoundNearest; to change the mode, we can use with_rounding. In the following example, 1.2000000000000001 cannot be represented exactly, so Julia rounds it to the nearest representable floating-point number.
In [4]:
# unable to represent exactly, use the default rounding
@printf("%4.16lf\n", 1.2000000000000001)
# rounding up
with_rounding(Float64, RoundUp) do
    @printf("%4.16lf\n", 1.1 + 0.1)
end
# rounding down
with_rounding(Float64, RoundDown) do
    @printf("%4.16lf\n", 1.1 + 0.1)
end
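Directed rounding can also be used to bracket a value that is not exactly representable, the basic trick of interval arithmetic. A minimal sketch with the same with_rounding API (its return value is the value of the enclosed block):
# 1/3 rounded toward -Inf and toward +Inf encloses the true value
lo = with_rounding(Float64, RoundDown) do
    1.0 / 3.0
end
hi = with_rounding(Float64, RoundUp) do
    1.0 / 3.0
end
@printf("1/3 lies in [%4.20lf, %4.20lf]\n", lo, hi)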
To understand machine numbers, let us look at the IEEE standard for single precision (Float32): it consists of one sign bit, 8 exponent bits, and 23 significand (fraction) bits. Single precision ranges from -Inf32 to Inf32.
In [5]:
-typemax(Float32) == typemin(Float32)
Out[5]:
Julia provides the bit representation of a floating-point number via the bits function.
In [6]:
# the Float32 literal 1.000001f0 is first rounded to 1.0000009536743164
@printf("1.000001f0 is rounded to %4.16lf\n", 1.000001f0)
# and its bit representation is 0 01111111 00000000000000000001000
@printf("its bits representation is %s\n", bits(1.000001f0))
# The representation means the number is positive, with exponent 127 (in base 10)
# and fraction part 2^(-20), i.e. significand 1 + 2^(-20).
# Thus the value is (1 + 2^(-20)) * 2^(127 - 127).
@printf("Convert to Float32 %4.16lf\n", float32(1 + 2.0^(-20)))
According to the IEEE standard, Inf32 is defined as 0 11111111 00000000000000000000000; if any of the last 23 bits is nonzero, the value is a NaN. Julia sets its canonical NaN to 0 11111111 10000000000000000000000.
In [7]:
# bits of Inf32
@printf("%s\n", bits(Inf32))
# bits of NaN32
@printf("%s\n", bits(NaN32))
In [8]:
# simple loop summation over an array (double precision)
function simple_sum(arr::AbstractArray, first::Int, last::Int)
    b = arr[first]
    for i = first + 1 : last
        @inbounds b += arr[i]
    end
    return b
end
# pairwise (divide-and-conquer) summation; falls back to the simple loop for small blocks
function pair_sum(arr::AbstractArray, first::Int, last::Int)
    if first + 1024 >= last
        return simple_sum(arr, first, last)
    end
    mid = (last + first) >>> 1
    return pair_sum(arr, first, mid) + pair_sum(arr, mid + 1, last)
end
# Kahan (compensated) summation: carry a correction term c for lost low-order bits
function kahan_sum(arr::AbstractArray)
    n = length(arr)
    if n == 0
        return 0.
    end
    arr_i = 0.            # loop variable hoisted out of the loop
    @inbounds s = arr[1]
    c = 0.                # running compensation
    for i = 2:n
        @inbounds arr_i = arr[i]
        t = s + arr_i
        if abs(s) >= abs(arr_i)
            c += ((s - t) + arr_i)   # recover the low-order bits of arr_i lost in s + arr_i
        else
            c += ((arr_i - t) + s)   # recover the low-order bits of s lost in s + arr_i
        end
        s = t
    end
    return s + c
end
Out[8]:
Consider the worst case of the simple_sum routine above: the accumulated error grows as O(eps n), where eps is the machine accuracy, while the computational cost is a reasonable O(n). There are, however, other summation algorithms, such as pairwise summation and Kahan (compensated) summation. Pairwise summation is a simple divide-and-conquer scheme, yet it reduces the error growth to O(eps log n); Kahan summation reduces the error to the machine-accuracy level O(eps), at the price of being slower than the other methods.
As for cost, every routine is O(n), but Kahan summation has the largest constant factor because of the extra compensation arithmetic, while pairwise (cascade) summation can be parallelized.
As for accuracy, we have to look a little deeper. Kahan's method compensates at every step, so we can expect a very accurate answer throughout the recursive summation if performance is not our goal. For the simple summation method, however, the ordering of the input array can play a great role in controlling the error.
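To make these growth rates concrete before the timings, here is a small error experiment (not in the original notebook). Float32 input is used so the effect is visible at a modest length, and the reference value is accumulated in Float64:
n = 10^7
arr32 = fill(0.1f0, n)       # 10^7 copies of the Float32 value nearest 0.1
exact = float64(0.1f0) * n   # essentially exact reference sum in Float64
@printf("simple loop error: %e\n", abs(simple_sum(arr32, 1, n) - exact))
@printf("pairwise error:    %e\n", abs(pair_sum(arr32, 1, n) - exact))
@printf("Kahan error:       %e\n", abs(kahan_sum(arr32) - exact))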
First, think about why the summation of the harmonic series converges on a fixed-precision computer (after a long time). When we add two numbers a + b where |a| is much larger than |b|, b may not contribute to the sum at all. In the following case, 10^(-15) is too small relative to log(10^15):
In [9]:
log(10^15) + 0.000000000000001 - log(10^15) == 0.000000000000001
Out[9]:
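To see this stalling concretely, a quick experiment (not from the original notebook) in Float32, where it happens soon enough to observe; in Float64 the same loop would run for an impractically long time:
s = 0.0f0
n = 1
# once 1/n falls below half an ulp of s, the partial sums stop changing
while s + 1.0f0/n != s
    s += 1.0f0/n
    n += 1
end
@printf("the Float32 harmonic sum stalls at n = %d with s = %f\n", n, s)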
Therefore, the ordering really matters here. If all numbers in the array are positive, summing in ascending order produces less error than other orderings, especially descending order.
In [10]:
rand_arr = rand(500000);
# ascending order gives a better result
sorted_rand_arr = sort(rand_arr, rev=false)
# simple loop
@printf("simple sum:\n")
@time @inbounds ret_1 = simple_sum(rand_arr, 1, length(rand_arr))
@time @inbounds sorted_ret_1 = simple_sum(sorted_rand_arr, 1, length(rand_arr))
@printf("\npairwise sum:\n")
# pairwise summation; uses the simple loop for small blocks
@time @inbounds ret_2 = pair_sum(rand_arr, 1, length(rand_arr))
@time @inbounds sorted_ret_2 = pair_sum(sorted_rand_arr, 1, length(rand_arr))
# Julia's mapreduce
@printf("\nJulia's mapreduce:\n")
@time @inbounds ret_3 = sum(rand_arr)
@time @inbounds sorted_ret_3 = sum(sorted_rand_arr)
# Kahan
@printf("\nKahan sum:\n")
@time @inbounds ret_4 = kahan_sum(rand_arr)
@time @inbounds sorted_ret_4 = kahan_sum(sorted_rand_arr)
@printf("\nunsorted simple sum is %4.16lf.\n",ret_1);
@printf("\nsorted simple sum is %4.16lf.\n",sorted_ret_1);
@printf("\nunsorted pairwise sum is %4.16lf.\n",ret_2);
@printf("\nsorted pairwise sum is %4.16lf.\n",sorted_ret_2);
@printf("\nunsorted Julia's sum is %4.16lf.\n",ret_3);
@printf("\nsorted Julia's sum is %4.16lf.\n",sorted_ret_3);
@printf("\nunsorted Kahan sum is %4.16lf.\n",ret_4);
@printf("\nsorted Kahan sum is %4.16lf.\n",sorted_ret_4);
Here we can see that Julia's built-in summation is faster. Let's look at how Julia deals with summation. The following is a simplified version of Julia's summation (see base/reduce.jl); the original code uses mapreduce to avoid temporary floating-point values. The technique used here is called loop unrolling. It is a code optimization that trades code size for performance: it reduces the instructions that control the loop and keeps several independent accumulators in flight, which hides the latency of each addition.
The following code unrolls the loop into a four-way summation and achieves roughly a 2x speedup. However, pleasant as the performance is, the error growth is still at the O(eps n) level.
In [11]:
function julia_sum(a::AbstractArray, ifirst::Int, ilast::Int)
    # assumes the range contains at least two elements
    @inbounds if ifirst + 6 >= ilast  # fewer than 8 elements
        i = ifirst
        s = a[i] + a[i+1]
        i = i + 1
        while i < ilast
            s += a[i += 1]
        end
        return s
    else  # at least 8 elements: manual four-way unrolling
        @inbounds s1 = a[ifirst] + a[ifirst + 4]
        @inbounds s2 = a[ifirst + 1] + a[ifirst + 5]
        @inbounds s3 = a[ifirst + 2] + a[ifirst + 6]
        @inbounds s4 = a[ifirst + 3] + a[ifirst + 7]
        i = ifirst + 8
        il = ilast - 3
        while i <= il
            @inbounds s1 += a[i]
            @inbounds s2 += a[i+1]
            @inbounds s3 += a[i+2]
            @inbounds s4 += a[i+3]
            i += 4
        end
        # mop up the remaining (at most 3) elements
        while i <= ilast
            @inbounds s1 += a[i]
            i += 1
        end
        return s1 + s2 + s3 + s4
    end
end
Out[11]:
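As a quick sanity check (not part of the original notebook), the unrolled routine should agree with Julia's built-in sum up to floating-point noise:
check_arr = rand(1000)
# both results are Float64 sums of the same data, so the gap should be tiny
@printf("difference from built-in sum: %e\n", abs(julia_sum(check_arr, 1, 1000) - sum(check_arr)))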
In [19]:
rand_arr = rand(500000);
# ascending order gives a better result
sorted_rand_arr = sort(rand_arr, rev=false)
# simplified unrolled version of Julia's sum
@printf("\nunrolled Julia-style sum:\n")
@time @inbounds ret = julia_sum(rand_arr,1,500000)
@time @inbounds sorted_ret = julia_sum(sorted_rand_arr, 1, 500000)
Although the compensated method takes much longer than the other methods, we still need it for high-accuracy computing, such as long-term integration: a simple loop will blow up our accuracy at some point, whereas Kahan summation never degrades the result by much. The following example integrates sin(x) over [0, pi] using both the simple loop and Kahan summation. We can see the error of the simple loop begin to rise in the last case, which would produce a U-shaped plot, while Kahan's error keeps decreasing.
In [14]:
# Caution: this may take some time
f(x) = sin(x)
@time for i = 1:8
    SIZE = 10^i
    nodes = zeros(SIZE)
    # most of the time is spent evaluating the integrand
    for j = 1:SIZE
        @inbounds nodes[j] = f(j*pi/SIZE)*pi/SIZE
    end
    ret_1 = simple_sum(nodes, 1, SIZE)
    ret_2 = kahan_sum(nodes)
    # unsorted Julia-style summation
    ret_3 = julia_sum(nodes, 1, SIZE)
    # sorted Julia-style summation
    sort!(nodes, rev=false)
    ret_4 = julia_sum(nodes, 1, SIZE)
    # print the exponent i of n = 10^i and the natural log of each absolute error
    @printf("%8d\t%4.16lf\t%4.16lf\t%4.16lf\t%4.16lf\n",
            i, log(abs(ret_1 - 2.0)), log(abs(ret_2 - 2.0)),
            log(abs(ret_3 - 2.0)), log(abs(ret_4 - 2.0)))
end
We can see the effect of the Kahan method above: while simple-loop summation accumulates error, kahan_sum keeps the error curve going down. Comparing the sorted and unsorted Julia-style summations, the error curve can be flattened by sorting the data before operating on it.
There are no true random numbers in Julia: the pseudorandom generator uses the Double precision SIMD-oriented Fast Mersenne Twister (dSFMT, see the dSFMT paper by Saito and Matsumoto), which is wrapped in base/dSFMT.jl. C++ users are probably more familiar with mt19937 and mt19937_64.
Julia also provides a randn function, which uses the ziggurat algorithm, a rejection-sampling method, to produce normally distributed samples. For more detailed information about RNGs, please refer to the Probability chapter.
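A minimal usage sketch of these generators (same-vintage API; the seed value 2014 is arbitrary):
srand(2014)     # seed the global dSFMT stream for reproducibility
u = rand(5)     # uniform samples on [0, 1)
z = randn(5)    # standard normal samples via the ziggurat algorithm
@printf("first uniform = %lf, first normal = %lf\n", u[1], z[1])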